Position: Data Engineer (PySpark)
Duration: Full Time
Location: Irving, TX (onsite)
Pay: 130-140k + benefits
Job Description
Big Data (PySpark) Tech Lead
• 10+ Years Overall Experience in Data Management, Data Lake and Data Warehouse
• 8+ Years Hadoop, Hive, Sqoop, SQL, Teradata
• 8+ Years PySpark (Python and Spark), Unix
• Good to have: industry-leading ETL experience
• Banking Domain experience
Key Responsibilities
• Ability to design, build, and unit test applications on the Spark framework in Python.
• Build PySpark-based applications for both batch and streaming requirements, which will require in-depth knowledge of the majority of Hadoop and NoSQL databases as well.
• Develop and execute data pipeline testing processes and validate business rules and policies
• Optimize performance of the built Spark applications in Hadoop using configurations around SparkContext, Spark SQL, DataFrames, and pair RDDs.
• Optimize performance for data access requirements by choosing the appropriate native Hadoop file formats (Avro, Parquet, ORC, etc.) and compression codecs.
• Ability to design & build real-time applications using Apache Kafka & Spark Streaming
• Build integrated solutions leveraging Unix shell scripting, RDBMS, Hive, HDFS, HDFS file types, and HDFS compression codecs.
• Build data tokenization libraries and integrate with Hive & Spark for column-level obfuscation
• Experience in processing large amounts of structured and unstructured data, including integrating data from multiple sources.
• Create and maintain an integration and regression testing framework on Jenkins, integrated with Bitbucket and/or Git repositories.
• Participate in the agile development process, and document and communicate issues and bugs related to data standards in scrum meetings.
• Work collaboratively with onsite and offshore teams.
• Develop & review technical documentation for artifacts delivered.
• Ability to solve complex data-driven scenarios and triage defects and production issues.
• Ability to learn-unlearn-relearn concepts with an open and analytical mindset
• Participate in code release and production deployment.
• Challenge and inspire team members to achieve business results in a fast-paced and quickly changing environment.